Tunix Gemma Reinforcement Learning


Authors
Affiliation

CJ Jones

Georgetown University

Adam Stein

Georgetown University

1 Abstract

2 Introduction

3 Literature Review

4 Dataset

Fig. 1: Annotated chess board output from ChessReD Dataset
Fig. 2: Chess board with no bounding boxes output from ChessReD Dataset

5 Methods

5.1 Convolutional Feature Extractor

Fig. 3: End-to-end architecture for combined chessboard recognition network
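The pipeline in Fig. 3 can be sketched as a small PyTorch module. This is a minimal sketch, not the authors' implementation: the ConvNeXt-B backbone is stubbed with a two-layer conv stack so the code runs without torchvision, the input resolution (256 × 256) and the 13-way output head (12 piece types + empty square) are assumptions, and only the 64 × 1024 square tokens and 4-layer transformer encoder are taken from the parameter summary in Fig. 4.

```python
import torch
import torch.nn as nn


class ChessboardRecognizer(nn.Module):
    """Sketch of the combined recognition network in Fig. 3.

    A small conv stack stands in for the ConvNeXt-B backbone so the
    sketch is self-contained. The 64 learned square tokens (64 x 1024)
    and the 4-layer transformer encoder follow Fig. 4; the 13-class
    head (12 pieces + empty) is an assumption.
    """

    def __init__(self, d_model=1024, n_layers=4, n_classes=13):
        super().__init__()
        # Stand-in for the ConvNeXt-B feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=8, stride=8),         # 256 -> 32
            nn.GELU(),
            nn.Conv2d(64, d_model, kernel_size=4, stride=4),   # 32 -> 8
        )
        # One learned token per board square (64 x 1024).
        self.square_tokens = nn.Parameter(torch.zeros(64, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Linear head: one class prediction per square.
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, images):                    # (B, 3, 256, 256)
        feats = self.backbone(images)             # (B, 1024, 8, 8)
        feats = feats.flatten(2).transpose(1, 2)  # (B, 64, 1024)
        tokens = feats + self.square_tokens       # add square embeddings
        tokens = self.transformer(tokens)         # (B, 64, 1024)
        return self.head(tokens)                  # (B, 64, n_classes)
```

With the dimensions above, the square-token matrix (64 × 1024 ≈ 66 k) and the linear head (1024 × 13 + 13 ≈ 13 k) reproduce the corresponding counts in Fig. 4.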

5.2 Training Procedure

Component               # Params            Trainable @ Start?   Trainable @ End?
ConvNeXt-B backbone     88 M                ❌ frozen            ✅ last 3 / 12
Transformer (4 layers)  17 M                ❌ frozen            ✅ last 1 / 4
Square tokens           64 × 1024 ≈ 66 k    ✅                   ✅
Linear head             13 k                ✅                   ✅
Total                   105 M               79 k (0.07%)         ~30 M (28.6%)

Fig. 4: Summary of model parameter counts, trainability at the start and end of training, and the staged unfreezing schedule.
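The schedule in Fig. 4 amounts to toggling `requires_grad` on the right submodules at each stage. A minimal sketch, assuming the backbone and transformer expose indexable lists of blocks/layers (the helper names here are hypothetical, not the training code):

```python
import torch
import torch.nn as nn


def set_trainable(module, flag):
    """Enable or disable gradients for every parameter in a module."""
    for p in module.parameters():
        p.requires_grad_(flag)


def apply_unfreeze_stage(backbone_blocks, tx_layers, head, tokens, stage):
    """Apply the staged unfreezing schedule summarized in Fig. 4.

    stage "start": only the square tokens and linear head train
                   (~79 k parameters).
    stage "end":   additionally the last 3 of 12 backbone blocks and
                   the last 1 of 4 transformer layers (~30 M parameters).
    """
    # Freeze everything, then re-enable the scheduled pieces.
    for m in (*backbone_blocks, *tx_layers, head):
        set_trainable(m, False)
    tokens.requires_grad_(True)
    set_trainable(head, True)
    if stage == "end":
        for block in backbone_blocks[-3:]:  # last 3 / 12 ConvNeXt blocks
            set_trainable(block, True)
        for layer in tx_layers[-1:]:        # last 1 / 4 transformer layers
            set_trainable(layer, True)


def count_trainable(params):
    """Count parameters that will receive gradients."""
    return sum(p.numel() for p in params if p.requires_grad)
```

Unfreezing only the final blocks keeps the early, generic features fixed while letting the later, task-specific layers adapt, which is consistent with the smooth convergence reported in Section 6.4.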


6 Results

6.1 Error Distribution and Exact-Match Accuracy

Metric                           Baseline ResNeXt (2023)   ConvNeXt-TE (+Tx) (Ours)
Mean incorrect squares / board   3.40                      4.33
Boards with no mistakes (%)      15.26                     9.12
Boards with ≤1 mistake (%)       25.92                     19.38
Per-square error rate (%)        5.31                      5.94


Fig. 5: Distribution of incorrect squares per predicted board. The model most frequently makes 2–4 errors per board, though outlier examples with 10+ mistakes persist.
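All four metrics in the table above follow from counting mismatched squares between paired prediction and ground-truth boards. A minimal sketch in plain Python, assuming each board is represented as an 8 × 8 matrix of class labels (the representation is an assumption, not the paper's evaluation code):

```python
def board_metrics(predictions, ground_truths):
    """Compute the Section 6.1 metrics from paired 8x8 label matrices:
    mean incorrect squares per board, % of boards with 0 and <=1
    mistakes, and the overall per-square error rate."""
    errors_per_board = []
    for pred, truth in zip(predictions, ground_truths):
        # Count squares where the predicted label differs from the truth.
        wrong = sum(
            p != t
            for prow, trow in zip(pred, truth)
            for p, t in zip(prow, trow)
        )
        errors_per_board.append(wrong)
    n = len(errors_per_board)
    return {
        "mean_incorrect_squares": sum(errors_per_board) / n,
        "pct_no_mistakes": 100 * sum(e == 0 for e in errors_per_board) / n,
        "pct_le1_mistake": 100 * sum(e <= 1 for e in errors_per_board) / n,
        "per_square_error_rate_pct": 100 * sum(errors_per_board) / (64 * n),
    }
```

Note that "boards with no mistakes" is the strictest metric (exact match over all 64 squares), which is why it is far lower than per-square accuracy for both models.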

6.2 Confusions

6.3 Qualitative

Fig. 7: Example of model prediction on a validation board. From left to right: original image, ground-truth board matrix, model-predicted matrix. The predicted board achieves a per-square accuracy of 0.984.

6.4 Training Dynamics

Fig. 8: Training and validation loss curves across all epochs. Smooth convergence reflects the stability of the training setup and the effectiveness of progressive unfreezing.

7 Discussion

8 References

Masouris, Athanasios, and Jan C. van Gemert. End-to-End Chess Recognition. Delft: Delft University of Technology, 2023. https://github.com/ThanosM97/end-to-end-chess-recognition.